LitLin 18_4 423-447 fqh009 FIN
Author
Abstract
Large, real-world data sets have been investigated in the context of authorship attribution of real-world documents. N-gram measures can be used to assign authorship accurately for long documents such as novels. A number of 5 (authors) × 5 (movies) arrays of movie reviews were acquired from the Internet Movie Database. Both n-gram and naive Bayes classifiers were used to classify along both the authorship and topic (movie) axes. Both approaches yielded similar results, and authorship was detected as accurately as, or more accurately than, topic. Part-of-speech tagging and function-word lists were used to investigate the influence of structure on classification, using documents with meaning removed but grammatical structure intact.
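The abstract names the two classifiers and the meaning-removal step but gives none of their details. Purely as an illustration, the sketch below shows one common way to build each in plain Python. It uses a character n-gram profile matcher based on the common n-gram (CNG) dissimilarity, a multinomial naive Bayes classifier with add-one smoothing, and a function-word mask that replaces every content word with a placeholder. The n-gram size, profile length, function-word list, and all function names are assumptions, not the study's settings. Part-of-speech tagging is left out because it needs an external tagger.

```python
import math
from collections import Counter

# HYPOTHETICAL function-word list, for illustration only; the study's
# actual list is not given in the abstract.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "but", "or", "of", "to", "in", "on", "for",
    "with", "is", "was", "it", "this", "that", "i", "not", "as", "by",
}


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer; punctuation stays attached to words.
    return text.lower().split()


def mask_content_words(text: str) -> str:
    """Keep function words and replace every other token with 'X',
    removing meaning while leaving a grammatical skeleton."""
    return " ".join(t if t in FUNCTION_WORDS else "X" for t in tokenize(text))


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_profile(text: str, n: int = 3, top: int = 500) -> dict[str, float]:
    """Relative frequencies of the `top` most frequent n-grams (assumed sizes)."""
    counts = char_ngrams(text.lower(), n)
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.most_common(top)}


def cng_dissimilarity(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Common n-gram (CNG) dissimilarity: sum of squared relative differences."""
    d = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        d += (2 * (f1 - f2) / (f1 + f2)) ** 2  # f1 + f2 > 0 for any g in the union
    return d


def ngram_classify(doc, train_docs, labels, n=3, top=500):
    """Assign `doc` to the label whose pooled training profile is nearest."""
    pooled: dict[str, list[str]] = {}
    for d, y in zip(train_docs, labels):
        pooled.setdefault(y, []).append(d)
    profiles = {y: ngram_profile(" ".join(ds), n, top) for y, ds in pooled.items()}
    target = ngram_profile(doc, n, top)
    return min(profiles, key=lambda y: cng_dissimilarity(target, profiles[y]))


class NaiveBayes:
    """Multinomial naive Bayes over word tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)  # class priors (as document counts)
        self.word_counts = {y: Counter() for y in self.doc_counts}
        for d, y in zip(docs, labels):
            self.word_counts[y].update(tokenize(d))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)

        def log_posterior(y):
            score = math.log(self.doc_counts[y] / n_docs)
            for t in tokenize(doc):
                # Add-one (Laplace) smoothing keeps unseen words from zeroing a class.
                score += math.log((self.word_counts[y][t] + 1) / (self.totals[y] + v))
            return score

        return max(self.doc_counts, key=log_posterior)
```

Because each review carries both an author label and a movie label, one can fit either classifier twice on the same texts, once per label set, and compare accuracy along the two axes. Running `mask_content_words` over the texts first gives a rough analogue of the meaning-removed condition.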
Similar resources
LitLin 18_4 361-378 fqh002 FIN
This paper presents the newly released Lancaster Corpus of Mandarin Chinese (LCMC), a Chinese match for the FLOB and Frown corpora of British and American English. We first discuss the major decisions we took when building the corpus. These relate to sampling, text collection, mark-up, and annotation. Following from this we use the corpus to study aspect marking in Chinese and British/American ...
LitLin 19_4 453-475 fqh034 FIN
Delta, a simple measure of the difference between two texts, has been proposed by John F. Burrows as a tool in authorship attribution problems, particularly in large ‘open’ problems in which conventional methods of attribution are not able to limit the claimants effectively. This paper tests Delta’s effectiveness and accuracy, and shows that it works nearly as well on prose as it does on poetry...
Monitoring Winter and Summer Abundance of Cetaceans in the Pelagos Sanctuary (Northwestern Mediterranean Sea) Through Aerial Surveys
Systematic long-term monitoring of abundance is essential to inform conservation measures and evaluate their effectiveness. To instigate such work in the Pelagos Sanctuary in the Mediterranean, two aerial surveys were conducted in winter and summer 2009. A total of 467 (131 in winter, 336 in summer) sightings of 7 species was made. Sample sizes were sufficient to estimate abundance of fin whale...
Reactor for Producing Large Particles of Materials from Gases
3,371,997 3/1968 Jordan et al. 423/450; 4,013,420 3/1977 Cheng 422/156; 4,084,024 4/1978 Schumacher 423/350; 4,154,870 5/1979 Wakefield 423/350 X; 4,241,022 12/1980 Kraus et al. 422/156; 4,292,344 7/1981 McHale 423/349 X; 4,314,525 2/1982 Hsu e...
MicroRNA-423 promotes cell growth and regulates G(1)/S transition by targeting p21Cip1/Waf1 in hepatocellular carcinoma.
MicroRNAs (miRNAs) are small non-coding RNA molecules that are often located in genomic breakpoint regions and can act as oncogenes or tumor suppressor genes in human cancer. Our previous study showed that microRNA-423 (miR-423), which localized to the frequently amplified region of chromosome 17q11, was upregulated in hepatocellular carcinoma (HCC). However, the potential functions and exact m...